This document describes and demonstrates analysis methods used to analyze mouse hippocampal subregion-specific transcriptome RNA-seq data (Farris et al manuscript in preparation.)
Raw sequence read data is provided through GEO GSE116343, with RNA-seq data available at GEO GSE116342. Data used for analysis in R is provided through an R package farrisdata, installed in R with devtools::install_github("jmw86069/farrisdata").
The input data for this analysis workflow is derived from Salmon quantitation files, which were already imported and assembled into relevant data matrices.
Salmon quantitation files were imported using the R Bioconductor package tximport, summarized to the gene level, and stored in an R object farrisGeneSE, which is available in the R data package farrisdata, installed using devtools::install_github("jmw86069/farrisdata").
## Load the gene data
library(farrisdata);
library(SummarizedExperiment);
farrisGeneSE;
## class: SummarizedExperiment
## dim: 49341 24
## metadata(4): design contrasts genes samples
## assays(2): counts raw_counts
## rownames(49341): -343C11.2 00R_AC107638.2 ... n-TSaga9 n-TStga1
## rowData names(4): probeID ProbeName GeneName geneSymbol
## colnames(24): CA2CB492 CA2CB496 ... DGDE496 DGDE502
## colData names(4): CellType Compartment AnimalID groupName
farrisGeneSE is a SummarizedExperiment object with this content:
assays(farrisGeneSE)[["counts"]]
assays also contains the raw Salmon pseudocounts accessible, using this command: assays(farrisGeneSE)[["raw_counts"]]
rowData data.frame with gene rows, using the gene symbol derived from the Gencode GTF file.
as.data.frame(head(rowData(farrisGeneSE), 10)) %>%
mutate() %>%
select(probeID, geneSymbol, ends_with("_type"),
contains("has")) %>%
kable(escape=FALSE) %>%
kable_styling()
| probeID | geneSymbol |
|---|---|
| -343C11.2 | -343C11.2 |
| 00R_AC107638.2 | 00R_AC107638.2 |
| 00R_Pgap2 | 00R_Pgap2 |
| 0610005C13Rik | 0610005C13Rik |
| 0610006L08Rik | 0610006L08Rik |
| 0610007P14Rik | 0610007P14Rik |
| 0610009B22Rik | 0610009B22Rik |
| 0610009E02Rik | 0610009E02Rik |
| 0610009L18Rik | 0610009L18Rik |
| 0610009O20Rik | 0610009O20Rik |
| CellType | Compartment | AnimalID | groupName | |
|---|---|---|---|---|
| CA2CB492 | CA2 | CB | 492 | CA2_CB |
| CA2CB496 | CA2 | CB | 496 | |
| CA2CB502 | CA2 | CB | 502 | |
| CA2DE492 | CA2 | DE | 492 | CA2_DE |
| CA2DE496 | CA2 | DE | 496 | |
| CA2DE502 | CA2 | DE | 502 | |
| CA1CB492 | CA1 | CB | 492 | CA1_CB |
| CA1CB496 | CA1 | CB | 496 | |
| CA1CB502 | CA1 | CB | 502 | |
| CA1DE492 | CA1 | DE | 492 | CA1_DE |
| CA1DE496 | CA1 | DE | 496 | |
| CA1DE502 | CA1 | DE | 502 | |
| CA3CB492 | CA3 | CB | 492 | CA3_CB |
| CA3CB496 | CA3 | CB | 496 | |
| CA3CB502 | CA3 | CB | 502 | |
| CA3DE492 | CA3 | DE | 492 | CA3_DE |
| CA3DE496 | CA3 | DE | 496 | |
| CA3DE502 | CA3 | DE | 502 | |
| DGCB492 | DG | CB | 492 | DG_CB |
| DGCB496 | DG | CB | 496 | |
| DGCB502 | DG | CB | 502 | |
| DGDE492 | DG | DE | 492 | DG_DE |
| DGDE496 | DG | DE | 496 | |
| DGDE502 | DG | DE | 502 |
limma package.limma.Salmon quantitation files were imported using tximport as above, maintaining individual isoform abundances, then stored in an R object farrisTxSE, which is available in the R data package farrisdata, installed using devtools::install_github("jmw86069/farrisdata").
## Load the gene data
farrisTxSE;
## class: SummarizedExperiment
## dim: 122733 24
## metadata(4): design contrasts genes samples
## assays(4): tpm counts raw_tpm raw_counts
## rownames(122733): ENSMUST00000193812.1 ENSMUST00000082908.1 ...
## ENSMUST00000189352.1 ENSMUST00000179623.1
## rowData names(10): transcript_id geneSymbol ... TxHasCDS
## TxDetectedByTPM
## colnames(24): CA1CB492 CA1CB496 ... DGDE496 DGDE502
## colData names(4): CellType Compartment AnimalID groupName
The farrisTxSE object is a named list with content similar to that described above for gene data:
tx_df <- subset(as.data.frame(rowData(farrisTxSE)),
geneSymbol %in% c("Gria1","Shank2","Ntrk2")) %>%
select(geneSymbol,
transcript_id,
ends_with("_type"),
contains("has"),
contains("tpm"));
tx_df <- mixedSortDF(tx_df,
byCols=match(c("geneSymbol","TxDetectedByTPM"),
colnames(tx_df))*c(1,-1));
colorSubGene <- c(colorSub,
group2colors(unique(tx_df$geneSymbol),
cRange=c(20,30),
lRange=c(80,90)));
tx_df2 <- kable_coloring(tx_df,
colorSub=colorSubGene,
row_color_by="geneSymbol",
verbose=FALSE,
returnType="kable") %>%
collapse_rows(columns=2, valign="middle") %>%
row_spec(0, background="#DDDDDD")
tx_df2;
| geneSymbol | transcript_id | gene_type | transcript_type | has3UTR | TxHas3UTR | TxHasExt3UTR | GeneHasExt3UTR | TxHasCDS | TxDetectedByTPM | |
|---|---|---|---|---|---|---|---|---|---|---|
| 15377 | Gria1 | ENSMUST00000036315.15 | protein_coding | protein_coding | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
| 15378 | ENSMUST00000094179.10 | protein_coding | protein_coding | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | |
| 15376 | ENSMUST00000125292.1 | protein_coding | protein_coding | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE | |
| 15379 | ENSMUST00000151045.2 | protein_coding | protein_coding | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | |
| 15380 | ENSMUST00000173531.1 | protein_coding | processed_transcript | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | |
| 26395 | Ntrk2 | ENSMUST00000109838.8 | protein_coding | protein_coding | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
| 26396 | ENSMUST00000079828.5 | protein_coding | protein_coding | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | |
| 100990 | Shank2 | ENSMUST00000105900.8 | protein_coding | protein_coding | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
| 100991 | ENSMUST00000146006.2 | protein_coding | protein_coding | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | |
| 100986 | ENSMUST00000105902.7 | protein_coding | protein_coding | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | |
| 100988 | ENSMUST00000213146.1 | protein_coding | protein_coding | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE | |
| 100992 | ENSMUST00000097929.3 | protein_coding | protein_coding | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | |
| 100994 | ENSMUST00000136979.1 | protein_coding | processed_transcript | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE |
farrisGeneSE above.limma package.limma.bgaExprs <- assays(farrisGeneSE)$counts;
bgaExprs[bgaExprs < 7] <- 7;
bgaExprsUse <- bgaExprs[rowMins(bgaExprs) < rowMaxs(bgaExprs),,drop=FALSE];
nrow(bgaExprsUse);
## [1] 18005
salmonBga1 <- bga(dataset=bgaExprsUse,
classvec=nameVector(colData(farrisGeneSE)$groupName,
colnames(farrisGeneSE)),
type="coa");
salmonBga1ly <- bgaPlotly3d(salmonBga1,
axes=c(1,2,3),
colorSub=colorSub,
useScaledCoords=FALSE,
drawVectors="none",
drawSampleLabels=FALSE,
superGroups=gsub("_.+", "", salmonBga1$fac),
ellipseType="none",
sceneX=0, sceneY=1, sceneZ=1,
verbose=FALSE);
salmonBga1ly;
# sys.source("/Users/wardjm/Projects/R-scripts/bga_functions.R", envir=bga_env, keep.source=TRUE);
# attach(bga_env);detach(pos=tail(igrep("bga_env", search()), -1));
First we calculate group medians using gene expression data, to determine the set of genes detected above an expression threshold of 128 pseudocounts (log2 = 7).
farrisGeneGroupMedians <- rowGroupMeans(assays(farrisGeneSE)[["counts"]],
groups=colData(farrisGeneSE)$groupName,
useMedian=TRUE);
# Plot histogram of expression in CA1_CB and CA1_DE
plotPolygonDensity(farrisGeneGroupMedians[,c("CA1_CB","CA1_DE")],
xlim=c(0,20),
ylim=c(0,700),
ablineV=7);
## Warning in hist.default(x, breaks = breaks, col = barCol, main = main,
## border = histBorder, : arguments 'col', 'border', 'main', 'xlim', 'ylab',
## '...' are not made use of
## Warning: In density.default(x, width = width, weight = rep(weightFactor,
## length.out = length(x)), ...) :
## extra argument 'xlim' will be disregarded
## Warning in rep(weightFactor, length.out = length(x)): 'x' is NULL so the
## result will be NULL
## Warning in hist.default(x, breaks = breaks, col = barCol, main = main,
## border = histBorder, : arguments 'col', 'border', 'main', 'xlim', 'ylab',
## '...' are not made use of
## Warning: In density.default(x, width = width, weight = rep(weightFactor,
## length.out = length(x)), ...) :
## extra argument 'xlim' will be disregarded
## Warning in rep(weightFactor, length.out = length(x)): 'x' is NULL so the
## result will be NULL
# Detected genes in CA1_CB and CA1_DE
CA1detected <- (farrisGeneGroupMedians[,"CA1_CB"] > 7 &
farrisGeneGroupMedians[,"CA1_DE"] > 7);
CA1detCBandDE <- rownames(farrisGeneGroupMedians)[CA1detected];
length(CA1detCBandDE);
## [1] 10877
Next we load previously gene lists representing detected CA1 dendrite genes from published studies from Nakayama et al, Ainsley et al, and Cajigas et al. These genes are available in the farrisdata package, as described above.
Produce a 4-way Venn diagram showing the overlapping genes from these studies.
GeneListL <- list(
Farris=CA1detCBandDE,
Cajigas=CajigasGenes,
Ainsley=AinsleyGenes,
Nakayama=NakayamaGenes);
GeneListIM <- list2im(GeneListL);
vps <- limma::vennDiagram(GeneListIM,
circle.col=rainbowJam(4));
Per-sample correlation, centered across all CB samples. First, we use Salmon normalized pseudocounts, restricted to genes where at least one group mean value is above log2(7), which is >= 128 normalized pseudocounts.
Then data is centered per Compartment, so that CellBody samples are centered by subtracting the mean CellBody expression per gene, and Dendrite samples are centered by subtracting the mean Dendrite expression per gene. This step uses centerGeneData() with the argument centerGroups.
Next we prepare a data.frame with color coding to show the CellType and Compartment values.
## farrisGeneGroupMedians
## Pull out only cell body samples
iSamples <- colnames(farrisGeneSE);
iSamplesCB <- vigrep("CB", iSamples);
iSamplesDE <- vigrep("DE", iSamples);
#iSamplesGrp <- colnames(iMatrix7grp);
iSamplesGrp <- colnames(farrisGeneGroupMedians);
iSamplesGrpCB <- vigrep("CB", iSamplesGrp);
iSamplesGrpDE <- vigrep("DE", iSamplesGrp);
corrCutoff <- 7;
genesAboveCutoff <- (rowMaxs(farrisGeneGroupMedians) >= corrCutoff);
genesAboveCutoffCB <- (rowMaxs(farrisGeneGroupMedians[,iSamplesGrpCB]) >= corrCutoff);
genesAboveCutoffDE <- (rowMaxs(farrisGeneGroupMedians[,iSamplesGrpDE]) >= corrCutoff);
genesAboveCutoffBoth <- (genesAboveCutoffCB & genesAboveCutoffDE);
iMatrix7 <- assays(farrisGeneSE[genesAboveCutoff,])[["counts"]];
centerGroups <- gsub("^.+(DE|CB).+$",
"\\1",
iSamples);
iMatrix7ctr <- centerGeneData(iMatrix7,
centerGroups=centerGroups);
iMatrix7ctrCor <- cor(iMatrix7ctr);
## Generate some color bars to annotate the heatmap
colDataColorsRepL <- as.data.frame(colData(farrisGeneSE)[,c("CellType","Compartment")]);
colDataColorsRep <- df2colorSub(colDataColorsRepL, colorSub=colorSub);
We use pheatmap() (not yet using heatmap.3()) to draw the heatmap.
pheat_colors <- list(Compartment=colorSub[c("CB","DE")],
CellType=colorSub[c("CA1","CA2","CA3","DG")]);
pheat_breaks <- warpAroundZero(seq(from=-1, to=1, length.out=51), lens=1);
pheatmap(iMatrix7ctrCor[iSamplesCB,iSamplesCB],
breaks=pheat_breaks,
border_color=NA,
#display_numbers=TRUE, number_color="grey80",
#cutree_rows=4, cutree_cols=4,
annotation_names_col=TRUE,
color=getColorRamp("RdBu_r", n=50),
annotation_col=colDataColorsRepL[iSamplesCB,2:1],
annotation_row=colDataColorsRepL[iSamplesCB,2:1],
annotation_colors=pheat_colors
);
## Warning in brewer.pal(n, col): n too large, allowed maximum for palette RdBu is 11
## Returning the palette you asked for with that many colors
For sake of comparison, create the same heatmap using ComplexHeatmap
#require(ComplexHeatmap);
cBR <- circlize::colorRamp2(breaks=pheat_breaks,
col=getColorRamp("RdBu_r", n=51));
## Warning in brewer.pal(n, col): n too large, allowed maximum for palette RdBu is 11
## Returning the palette you asked for with that many colors
colHA <- HeatmapAnnotation(colDataColorsRepL[iSamplesCB,2:1],
show_annotation_name=TRUE,
col=pheat_colors);
rowA <- rowAnnotation(colDataColorsRepL[iSamplesCB,2:1],
col=pheat_colors,
show_annotation_name=TRUE,
show_legend=FALSE);
corHM <- Heatmap(iMatrix7ctrCor[iSamplesCB,iSamplesCB],
name="Correlation",
column_dend_height=unit(20, "mm"),
row_dend_width=unit(20, "mm"),
top_annotation=colHA,
col=cBR);
draw(rowA + corHM, row_dend_side="left");
if (1 == 2) {
## For now, commenting out custom heatmap function
par("family"="Arial", "lend"="square", "ljoin"="mitre");
heatmap.3(iMatrix7ctrCor[iSamplesCB,iSamplesCB],
margins=c(15,15),
trace="none",
tracecol="transparent",
col=colorRampPalette(rev(brewer.pal(15,"RdBu")))(51),
add.expr=box(),
cexCol=1.5,
cexRow=1.5,
keyLabel="Correlation",
colLensFactor=10,
distMethod="euclidean",
hclustMethod="ward",
doColorDendrograms=TRUE,
ColSideColors=t(colDataColorsRep[iSamplesCB,2:1]),
ColSideColorsNote=t(colDataColorsRepL[iSamplesCB,2:1]),
RowSideColors=(colDataColorsRep[iSamplesCB,2:1]),
RowSideColorsNote=(colDataColorsRepL[iSamplesCB,2:1]) );
}
Correlation showing CA2_CB correlates highest with CA2_DE, etc. for each CellType.
## farrisGeneGroupMedians[genesAboveCutoff,]
iMatrix7grp <- farrisGeneGroupMedians[genesAboveCutoffCB,];
centerGroupsGrp <- gsub("^.*(DE|CB).*$",
"\\1",
iSamplesGrp);
## 31oct2018 using mean instead of median
iMatrix7grpCtr <- centerGeneData(iMatrix7grp,
centerGroups=centerGroupsGrp,
mean=TRUE);
iMatrix7grpCtrCor <- cor(iMatrix7grpCtr);
## Generate some color bars to annotate the heatmap
colDataColorsGrpL <- data.frame(rbindList(
strsplit(nameVector(iSamplesGrp), "_")));
colnames(colDataColorsGrpL) <- c("CellType","Compartment");
colDataColorsGrp <- df2colorSub(colDataColorsGrpL,
colorSub=colorSub);
We use pheatmap() (not yet using heatmap.3()) to draw the heatmap.
pheatmap(iMatrix7grpCtrCor[iSamplesGrp,iSamplesGrp],
breaks=seq(from=-1, to=1, length.out=51),
border_color=NA,
#display_numbers=TRUE, number_color="grey20",
#cutree_rows=4, cutree_cols=4,
annotation_names_col=TRUE,
color=getColorRamp("RdBu_r", n=50),
annotation_col=colDataColorsGrpL[iSamplesGrp,2:1],
annotation_row=colDataColorsGrpL[iSamplesGrp,2:1],
annotation_colors=pheat_colors
);
## Warning in brewer.pal(n, col): n too large, allowed maximum for palette RdBu is 11
## Returning the palette you asked for with that many colors
if (1 == 2) {
par("family"="Arial", "lend"="square", "ljoin"="mitre");
heatmap.3(iMatrix7grpCtrCor[iSamplesGrp,iSamplesGrp],
margins=c(15,15),
trace="none",
tracecol="transparent",
hclCutoff=70,
cexColSideColorsNote=1.2,
cexRowSideColorsNote=1.2,
col=colorRampPalette(rev(brewer.pal(15,"RdBu")))(51),
add.expr=box(),
cexCol=1.5,
cexRow=1.5,
keyLabel="Correlation",
colLensFactor=10,
distMethod="euclidean",
hclustMethod="ward",
ColSideColors=t(colDataColorsGrp[iSamplesGrp,2:1]),
ColSideColorsNote=t(colDataColorsGrpL[iSamplesGrp,2:1]),
RowSideColors=(colDataColorsGrp[iSamplesGrp,2:1]),
RowSideColorsNote=(colDataColorsGrpL[iSamplesGrp,2:1]) );
}
ComplexHeatmap version:
colHA2 <- HeatmapAnnotation(colDataColorsGrpL[iSamplesGrp,2:1],
show_annotation_name=TRUE,
col=pheat_colors);
rowA2 <- rowAnnotation(colDataColorsGrpL[iSamplesGrp,2:1],
col=pheat_colors,
show_annotation_name=TRUE,
show_legend=FALSE);
corHM2 <- Heatmap(iMatrix7grpCtrCor[iSamplesGrp,iSamplesGrp],
name="Correlation",
column_dend_height=unit(20, "mm"),
row_dend_width=unit(20, "mm"),
top_annotation=colHA2,
col=cBR);
draw(rowA2 + corHM2, row_dend_side="left");
Scatterplot showing the +1,+1 selection of genes, which helps define the gene lists in the next section.
corCols <- c("CA2_DE","CA2_CB");
cutCB <- round(digits=3, log2(1.5));
cutDE <- round(digits=3, log2(1.5));
splomL <- lapply(nameVector(c("CA1","CA2","CA3","DG")), function(k) {
corCols <- vigrep(k, colnames(iMatrix7grpCtr));
df1 <- as.data.frame(iMatrix7grpCtr[,corCols]);
CBhit <- (abs(df1[,1]) > cutCB) * sign(df1[,1]);
DEhit <- (abs(df1[,2]) > cutCB) * sign(df1[,2]);
CBhitN <- paste0(k, "_", "CBhit");
DEhitN <- paste0(k, "_", "DEhit");
df1[,CBhitN] <- CBhit;
df1[,DEhitN] <- DEhit;
df1[,"GeneName"] <- rownames(df1);
df1;
});
#table(splomL[[1]][,3:4])
splomLsub <- lapply(splomL, function(iDF){
subset(iDF, !iDF[,3] %in% 0 | !iDF[,4] %in% 0)
});
splomDF2 <- rbindList(lapply(splomLsub, function(iDF){
iDF[,"CellType"] <- gsub("_.+", "", colnames(iDF)[1]);
colnames(iDF) <- c("CB","DE","CBhit","DEhit","GeneName","CellType");
iDF;
}));
#splomDF2[,"row"] <- ifelse(splomDF2$CellType %in% c("CA1","CA2"), "CA1", "CA3");
splomDF2[,"CBDEhit"] <- paste0(splomDF2$CBhit,":",splomDF2$DEhit);
## Color-code the splom points
ccColors <- nameVector(rep("grey", 8), unique(splomDF2$CBDEhit));
ccColors[c("1:1","-1:-1")] <- "orangered3";
ccColors[c("-1:1","1:-1")] <- "grey";
## Gene labels
GeneHighlight <- c("Plch2","Rgs14","Necab2","1700024P16Rik");
splomDF2$label <- ifelse(splomDF2$GeneName %in% GeneHighlight,
splomDF2$GeneName, "");
splomDF2[,"CBmaxGroupMean"] <- rowMaxs(farrisGeneGroupMedians[splomDF2$GeneName,iSamplesGrpCB]);
splomDF2[,"DEmaxGroupMean"] <- rowMaxs(farrisGeneGroupMedians[splomDF2$GeneName,iSamplesGrpDE]);
splomDF2[,"CBdet7"] <- (splomDF2[,"CBmaxGroupMean"] > 7)+0;
splomDF2[,"DEdet7"] <- (splomDF2[,"DEmaxGroupMean"] > 7)+0;
gg2 <- ggplot(splomDF2,
aes(x=CB, y=DE, color=CBDEhit, fill=CBDEhit, GeneName=GeneName)) +
geom_point(shape=21, size=2) +
geom_vline(xintercept=c(-1,1)*0.585,
linetype="dashed", color="grey20") +
geom_hline(yintercept=c(-1,1)*0.585, linetype="dashed", color="grey20") +
facet_wrap(facets=~CellType) +
theme_jam() +
scale_fill_manual(values=ccColors) +
scale_color_manual(values=makeColorDarker(ccColors)) +
ggrepel::geom_label_repel(aes(label=label),
force=8,
fill="white") +
theme(legend.position="none");
gg2;
Microarray correlation heatmap.
(Description of the four gene/transcript lists)
## anyGenes1 is the full set of all genes with abundance > 1
anyGenes1 <- rownames(farrisGeneGroupMedians)[rowMaxs(farrisGeneGroupMedians) > 1];
## Filtering rules for CB and DE
corFilterRuleCB <- (rowMaxs(farrisGeneGroupMedians[,iSamplesGrpCB]) >= 7);
corFilterRuleDE <- (rowMaxs(farrisGeneGroupMedians[,iSamplesGrpDE]) >= 7);
corFilterRule <- (corFilterRuleCB);
DEgenes7 <- rownames(farrisGeneGroupMedians)[corFilterRuleDE];
CBgenes7 <- rownames(farrisGeneGroupMedians)[corFilterRuleCB];
DEgenes7nonCB7 <- setdiff(DEgenes7, CBgenes7); # non-pyramidal
CBgenes7nonDE7 <- setdiff(CBgenes7, DEgenes7); # "non-DE genes"
DEgenes7andCB7 <- intersect(DEgenes7, CBgenes7); # "DE genes"
## Define neuro gene lists
geneListsAll <- list(anyGenes1=anyGenes1,
DEandCB=DEgenes7andCB7, # detected
CBonly=CBgenes7nonDE7, # detected only CB, not detected DE
`non-pyramidal`=DEgenes7nonCB7,
`CB1andDE1`=unique(subset(splomDF2,
CBDEhit %in% "1:1" &
CBdet7 %in% c(1) &
DEdet7 %in% c(1))$GeneName)
);
#lengths(geneListsAll);
geneListsDF <- data.frame(GeneList=names(geneListsAll),
GeneCount=format(lengths(geneListsAll), big.mark=",", trim=TRUE));
rownames(geneListsDF) <- NULL;
geneListsDF2 <- kable_coloring(geneListsDF,
colorSub=group2colors(names(geneListsAll)),
row_color_by="GeneList",
returnType="kable") %>%
row_spec(0, background="#DDDDDD")
geneListsDF2;
| GeneList | GeneCount |
|---|---|
| anyGenes1 | 28,075 |
| DEandCB | 12,265 |
| CBonly | 548 |
| non-pyramidal | 1,871 |
| CB1andDE1 | 1,055 |
Heatmap per-gene.
We refer to the function defineDetectedTx() in the jampack R package. The function takes normalized counts, normalized TPM values, sample group information, and several cutoff values:
#refreshFunctions("farrisSalmonWWS");
#detectedTxTPML <- try(defineDetectedTx(
detectedTxTPML <- defineDetectedTx(
iMatrixTx=assays(farrisTxSE)[["counts"]],
iMatrixTxTPM=assays(farrisTxSE)[["tpm"]],
groups=colData(farrisTxSE)$groupName,
cutoffTxPctMax=10,
cutoffTxExpr=32,
cutoffTxTPMExpr=2,
tx2geneDF=renameColumn(rowData(farrisTxSE),
from=c("geneSymbol","probeID"),
to=c("gene_name","transcript_id")),
useMedian=FALSE,
verbose=FALSE);
## ## (16:51:57) 31Oct2018: shrinkMatrix(): Duration for data.table DT creation: 0.03014302 secs
## ## (16:51:57) 31Oct2018: shrinkMatrix(): Duration for data.table shrinkMatrix: 0.06284189 secs
detectedTx <- detectedTxTPML$detectedTx;
numDetectedTx <- length(detectedTx);
detectedGenes <- mixedSort(unique(rowData(farrisTxSE[detectedTx,])$geneSymbol));
numDetectedGenes <- length(detectedGenes);
The code above defined 27,759 detected transcripts by the given criteria, covering 13,951 unique gene symbols.
e.g. protein_coding, etc.
We imported Gencode vM12 comprehensive GTF into R using R Bioconductor GenomicFeatures package, with which we derived 3’UTR regions, using GenomicFeatures::makeTxDbFromGFF() and GenomicFeatures::threeUTRsByTranscript(), respectively.
We plotted 3’UTR lengths as violin plots using only detected transcripts based upon TPM criteria described elsewhere.
Used DESeq-normalized expression values.
Creating gmtT format, refreshing gene symbols in the gmtT data, and the RNAseq gene hit data to be fully in sync.
Commentary on alt gene symbols, conversion to official mouse Entrez gene symbols.
Wrapper around standard hypergeometric enrichment tests, with a custom background of detected genes.
Subset pathways for top 15 per source-category
Heatmap of enrichment P-values.
Cnet plot